Automatic Chinese Abbreviation Generation Using Conditional Random Field

Authors

  • Dong Yang
  • Yi-Cheng Pan
  • Sadaoki Furui
Abstract

Boulder, Colorado, June 2009. ©2009 Association for Computational Linguistics

Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan
{raymond,thomas,furui}@furui.cs.titech.ac.jp

This paper presents a new method for automatically generating abbreviations for Chinese organization names. Abbreviations are commonly used in spoken Chinese, especially for organization names. Generating Chinese abbreviations is much more complex than generating English abbreviations, most of which are acronyms and truncations. The abbreviation generation process is formulated as a character tagging problem, and a conditional random field (CRF) is used as the tagging model. A carefully selected group of features is used in the CRF model. After a list of abbreviation candidates is generated using the CRF, a length model is incorporated to re-rank the candidates. Finally, full-name and abbreviation co-occurrence information from a web search engine is used to further improve performance. The proposed method achieves a top-10 coverage of 88.3%.
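The tagging-then-re-ranking pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a real CRF scores whole tag sequences with learned features, whereas here each character simply gets an independent, made-up "keep" probability, and the length model is a hand-written table. The function names `candidates` and `rerank`, the example probabilities, and the `weight` parameter are all assumptions introduced for illustration. The web co-occurrence step is omitted.

```python
from itertools import product
from math import log


def candidates(full_name, keep_prob, top_n=10):
    """Score every keep/skip tagging of the characters; return the top_n.

    keep_prob[i] is a stand-in for the tagger's probability that character i
    is kept in the abbreviation (each value must lie strictly between 0 and 1).
    """
    scored = []
    for tags in product((True, False), repeat=len(full_name)):
        if not any(tags):
            continue  # skip the empty abbreviation
        abbr = "".join(ch for ch, keep in zip(full_name, tags) if keep)
        score = sum(log(p if keep else 1.0 - p)
                    for p, keep in zip(keep_prob, tags))
        scored.append((abbr, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def rerank(cands, full_len, length_prob, weight=1.0):
    """Add a weighted log length prior to each candidate's score and re-sort."""
    rescored = []
    for abbr, score in cands:
        # Unseen (full length, abbreviation length) pairs get a small floor value.
        prior = length_prob.get((full_len, len(abbr)), 1e-6)
        rescored.append((abbr, score + weight * log(prior)))
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored


# Illustrative numbers only, not taken from the paper.
full_name = "北京大学"  # Peking University, commonly abbreviated 北大
keep_prob = [0.9, 0.2, 0.8, 0.55]
length_prob = {(4, 1): 0.05, (4, 2): 0.7, (4, 3): 0.2, (4, 4): 0.05}

top = candidates(full_name, keep_prob)
reranked = rerank(top, len(full_name), length_prob)
print("tagging only:", top[0][0])
print("with length model:", reranked[0][0])
```

With these made-up numbers, the tagging score alone ranks 北大学 first, and the length prior moves the two-character 北大 to the top, which is the kind of correction a length model is meant to provide.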


Similar resources

Cluster based Chinese abbreviation modeling

Abbreviations are widely observed in spoken Chinese. Automatic generation of Chinese abbreviations helps to improve Chinese natural language understanding systems and Chinese search engines. Abbreviation generation is treated as a character-based tagging problem. Due to limited training data, Chinese abbreviation generation suffers from data sparseness. Two types of strat...


Vocabulary expansion through automatic abbreviation generation for Chinese voice search

Long named entities are often abbreviated in spoken Chinese, and this usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. Generating Chinese abbreviations is much more complex than generating English abbreviations, most of which are acronyms and truncations. In this paper, we propose a new method for automatically generating abbreviations for Chinese named en...


A New Approach to Automatic Summarization by Using Latent Dirichlet Allocation in Conditional Random Field

Xiaofeng Wu, Chengqing Zong (National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China). Abstract: In recent years, Latent Dirichlet Allocation (LDA) has been used more and more in document clustering, classification, and segmentation, and someone has used it in ...


Chemical name recognition with harmonized feature-rich conditional random fields

This article presents a machine learning-based solution for automatic chemical and drug name recognition on scientific documents, which was applied in the BioCreative IV CHEMDNER task, namely in the chemical entity mention recognition (CEM) and the chemical document indexing (CDI) sub-tasks. The proposed approach applies conditional random fields with a rich feature set, including linguistic, o...


Automatic Grammatical Error Detection for Chinese based on Conditional Random Field

In the process of learning and using Chinese, foreign learners may make grammatical errors due to negative transfer from their native languages. Currently, computer-based automatic detection of grammatical errors is not mature enough. Based on the CGED2016 evaluation task, we select and analyze the classification model and design a feature extraction method to obtain grammatical errors ...




Publication date: 2009